BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment

Validation

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://doi.org/10.5281/zenodo.6913045).

To comply with the Common Data Model, the dataset must pass a set of 20 validation rules (V01 to V20). The data are tested against each rule, and the results of this validation are summarized in the table below.

Validation rule	Rule name	Items	Passes	Fails	Percentage of fails	Number of NAs	Percentage of NAs	Error	Warning
is.na(age_nm) | age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08	V01	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(sex_cd) | sex_cd %vin% c(0, 1, 2, 9)	V02	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")	V03	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")	V04	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(number_doses) | number_doses - 0 >= -1e-08 & number_doses - 10 <= 1e-08	V05	650000	650000	0	0%	0	0%	FALSE	FALSE
fully_vaccinated_bl == FALSE | fully_vaccinated_bl == TRUE & !is.na(vaccination_schedule_cd)	V06	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(test_type_cd) | test_type_cd %vin% c("PCR", "AG", "other")	V07	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(variant_cd) | variant_cd %vin% c("alpha", "beta", "gamma", "delta", "omicron", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu")	V08	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(pregnancy_bl) | pregnancy_bl == FALSE | (pregnancy_bl == TRUE & abs(sex_cd - 2) <= 1e-08 & age_nm - 12 >= -1e-08 & age_nm - 55 <= 1e-08)	V09	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(essential_worker_bl) | essential_worker_bl == FALSE | (essential_worker_bl == TRUE & age_nm - 16 >= -1e-08 & age_nm - 70 <= 1e-08)	V10	650000	647119	2881	0.44%	0	0%	FALSE	FALSE
(is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)	V11	650000	650000	0	0%	0	0%	FALSE	FALSE
(is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)	V12	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(previous_infection_dt) | is.na(confirmed_case_dt) | !is.na(previous_infection_dt) & !is.na(confirmed_case_dt) & (previous_infection_dt < confirmed_case_dt)	V13	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(confirmed_case_dt) | is.na(exitus_dt) | !is.na(confirmed_case_dt) & !is.na(exitus_dt) & (confirmed_case_dt <= exitus_dt)	V14	650000	649161	839	0.13%	0	0%	FALSE	FALSE
is.na(previous_infection_dt) | is.na(exitus_dt) | !is.na(previous_infection_dt) & !is.na(exitus_dt) & (previous_infection_dt <= exitus_dt)	V15	650000	649957	43	0.01%	0	0%	FALSE	FALSE
is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt	V16	650000	649362	638	0.1%	0	0%	FALSE	FALSE
(!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & number_doses - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(number_doses - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(number_doses - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(number_doses - 0) <= 1e-08)	V17	650000	636790	13210	2.03%	0	0%	FALSE	FALSE
is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))	V18	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))	V19	650000	650000	0	0%	0	0%	FALSE	FALSE
is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))	V20	650000	650000	0	0%	0	0%	FALSE	FALSE
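The rules in the table are R expressions. The `%vin%` operator and the `1e-08` slack terms suggest the R validate package generated them, although the report does not say so.

Each rule is evaluated once per record, so Items is the number of records. Passes, Fails and NAs count the records for which the rule evaluates to TRUE, FALSE or NA. Both percentage columns are relative to Items: for V17, 13210 / 650000 ≈ 2.03%.

Most rules start with a guard such as `is.na(age_nm) | ...`. Because `TRUE | NA` is `TRUE` in R, a missing value makes such a rule pass instead of returning NA.

The Python sketch below re-expresses four of the twenty rules (V01, V10, V14, V17) to show this logic. It is not the authors' pipeline. Records are plain dicts with `None` for missing values. Every name in it (`EPS`, `RULES`, `summarize` and the `v..` rule functions) is hypothetical, and the example records are made up.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

# Slack used in the rule expressions: "x >= 5" is written as "x - 5 >= -1e-08".
EPS = 1e-08

Record = Dict[str, object]
# A rule returns True (pass), False (fail) or None (NA, i.e. undecidable).
Rule = Callable[[Record], Optional[bool]]


def v01_age(r: Record) -> Optional[bool]:
    # is.na(age_nm) | age_nm within [5, 115]
    age = r.get("age_nm")
    if age is None:
        return True  # the is.na() guard makes a missing age pass
    return age - 5 >= -EPS and age - 115 <= EPS


def v10_essential_worker_age(r: Record) -> Optional[bool]:
    # Essential workers must be aged 16 to 70.
    worker = r.get("essential_worker_bl")
    if worker is None or worker is False:
        return True
    age = r.get("age_nm")
    if age is None:
        return None  # TRUE & NA evaluates to NA in R
    return age - 16 >= -EPS and age - 70 <= EPS


def v14_case_before_death(r: Record) -> Optional[bool]:
    # confirmed_case_dt <= exitus_dt whenever both dates are present.
    case, death = r.get("confirmed_case_dt"), r.get("exitus_dt")
    if case is None or death is None:
        return True
    return case <= death


def v17_doses_match_dates(r: Record) -> Optional[bool]:
    # number_doses must agree with which dose dates are filled in.
    pattern = tuple(r.get(k) is not None for k in ("dose_1_dt", "dose_2_dt", "dose_3_dt"))
    checks = {
        (True, True, True): lambda n: n - 3 >= -EPS,
        (True, True, False): lambda n: abs(n - 2) <= EPS,
        (True, False, False): lambda n: abs(n - 1) <= EPS,
        (False, False, False): lambda n: abs(n - 0) <= EPS,
    }
    if pattern not in checks:
        return False  # e.g. a dose 2 date without a dose 1 date matches no clause
    n = r.get("number_doses")
    if n is None:
        return None  # the matching clause becomes TRUE & NA, i.e. NA
    return checks[pattern](n)


RULES: Dict[str, Rule] = {
    "V01": v01_age,
    "V10": v10_essential_worker_age,
    "V14": v14_case_before_death,
    "V17": v17_doses_match_dates,
}


def summarize(rules: Dict[str, Rule], records: List[Record]) -> List[dict]:
    """One summary row per rule, mirroring the columns of the table."""
    rows = []
    for name, rule in rules.items():
        outcomes = [rule(r) for r in records]
        items = len(outcomes)
        fails = sum(o is False for o in outcomes)
        nas = sum(o is None for o in outcomes)
        rows.append({
            "name": name,
            "items": items,
            "passes": sum(o is True for o in outcomes),
            "fails": fails,
            "pct_fails": round(100 * fails / items, 2),
            "nNA": nas,
            "pct_NA": round(100 * nas / items, 2),
        })
    return rows


if __name__ == "__main__":
    # Three made-up records, for illustration only.
    records = [
        {"age_nm": 30, "essential_worker_bl": True, "number_doses": 2,
         "dose_1_dt": date(2021, 3, 1), "dose_2_dt": date(2021, 4, 1)},
        {"age_nm": 80, "essential_worker_bl": True, "number_doses": 1,
         "dose_1_dt": date(2021, 5, 1),
         "confirmed_case_dt": date(2021, 6, 10), "exitus_dt": date(2021, 6, 1)},
        {"age_nm": None, "essential_worker_bl": True, "number_doses": 0},
    ]
    for row in summarize(RULES, records):
        print(row)
```

Running the script prints one summary row per rule. On the three made-up records, V10 reports 1 pass, 1 fail and 1 NA, because the third record is an essential worker with a missing age.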

The vertical bars in the validation plot show, for each rule, the percentage of records that are ‘Passing’, ‘Failing’ and ‘Missing’.

Non-compliance with the Common Data Model specification

All rules in this set are considered essential: a record must violate none of them to be included in the subsequent analysis. A logical variable, flag_violating_val, is created in the cohort_data table of the BY-COVID-WP5-BaselineUseCase-VE.duckdb database. It is set to TRUE when the record violates at least one of the pre-specified validation rules, and to FALSE otherwise.

flag_violating_val==TRUE flag_violating_val==FALSE
17498 632502
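A minimal Python sketch of how the flag could be derived, assuming per-record rule outcomes are already available as True (pass), False (fail) or None (NA). This is not the authors' code. The function names `compute_flag`, `tabulate_flags` and `total_failures` are hypothetical. The sketch works in memory rather than writing to the cohort_data table in DuckDB. Treating an NA outcome as "not violated" is an assumption, since the report does not say how NAs are handled.

The sketch also shows why the flagged count is smaller than the summed Fails column. A record that fails several rules is flagged only once. The Fails column sums to 2881 + 839 + 43 + 638 + 13210 = 17611 failures, against 17498 flagged records, so some records fail more than one rule.

```python
from collections import Counter
from typing import Dict, List, Optional

# Rule name -> True (pass), False (fail) or None (NA) for one record.
RuleResults = Dict[str, Optional[bool]]


def compute_flag(results: RuleResults) -> bool:
    """True when the record violates at least one rule in the pre-specified set.

    Assumption (not stated in the report): an NA outcome is not a violation.
    """
    return any(outcome is False for outcome in results.values())


def tabulate_flags(per_record: List[RuleResults]) -> Dict[bool, int]:
    """Count records by flag value, like the flag_violating_val TRUE/FALSE counts."""
    counts = Counter(compute_flag(r) for r in per_record)
    return {True: counts[True], False: counts[False]}


def total_failures(per_record: List[RuleResults]) -> int:
    """Rule-level failures summed over all rules (a record can contribute several)."""
    return sum(o is False for r in per_record for o in r.values())


if __name__ == "__main__":
    per_record = [
        {"V10": False, "V17": False},  # fails two rules, but is flagged only once
        {"V10": True, "V17": True},
        {"V10": None, "V17": True},
    ]
    print(tabulate_flags(per_record))  # {True: 1, False: 2}
    print(total_failures(per_record))  # 2
```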